Lexical Semantics Annotation for Enriched Portuguese Corpora
Abstract
The semantic annotation of corpora has an important role to play in ensuring that sentences occurring in natural language texts are correctly understood based on their intended context. Two examples of lexical semantic units that contribute to this knowledge are word senses – which allow words with multiple meanings to be understood based on the context in which they are used – and named entities – which can be disambiguated and linked back to the specific encyclopedic resources that describe them. In this paper, we describe the construction of lexical semantically annotated corpora for Portuguese, annotated with both word senses linked to senses in a Portuguese wordnet and named entities linked to Portuguese Wikipedia entries using DBpedia. The result is a gold-standard lexical semantically annotated resource that is useful in supporting the training and evaluation of tools for the disambiguation of these lexical units in Portuguese.
Similar resources
The Hinoki Sensebank - A Large-Scale Word Sense Tagged Corpus Of Japanese
While there has been considerable research on both structural annotation (such as the Penn Treebank (Taylor et al., 2003) or the Kyoto Corpus (Kurohashi and Nagao, 2003)) and semantic annotation (e.g. Senseval: Kilgariff and Rosenzweig, 2000; Shirai, 2002), there are almost no corpora that combine both. This makes it difficult to carry out research on the interaction between syntax and semantic...
VerbLexPor: a lexical resource with semantic roles for Portuguese
This paper presents a lexical resource developed for Portuguese. The resource contains sentences annotated with semantic roles. The sentences were extracted from two domains: Cardiology research papers and newspaper articles. Both corpora were analyzed with the PALAVRAS parser and subsequently processed with a subcategorization frames extractor, so that each sentence that contained at least one...
The Limits of Using FrameNet Frames to Build a Legal Ontology
FrameNet frames have been used to develop lexical databases and annotated corpora for different languages. This paper analyses the use of FrameNet frames to build a legal ontology for Brazilian law. In order to discuss the problems of such an approach to ontology development, the lexical units evoking the Criminal_process frame were contrasted in English and Portuguese. Frame divergence betwee...
Corpus-based Induction of a Frame Semantics Projection for LFG
In computational linguistics there is growing insight that high-quality NLP applications for information access (question answering, etc.) are in need of deeper linguistic analysis, in particular, semantic analysis. A bottleneck for semantic processing is the lack of large-scale domain-independent lexical semantic resources. While WordNets for several languages are important lexical resources fo...
Retrieving Lexical Semantics from Multilingual Corpora
This paper presents a technique to build a lexical resource used for annotation of parallel corpora where the tags can be seen as multilingual ‘synsets’. The approach can be extended to add relationships between these synsets that are akin to WordNet relationships of synonymy and hypernymy. The paper also discusses how the success of this approach can be measured. The reported results are for E...